23

Given ANY GitHub repository url string like:

git://github.com/some-user/my-repo.git

or

git@github.com:some-user/my-repo.git

or

https://github.com/some-user/my-repo.git

What is the best way in bash to extract the repository name my-repo from any of the strings above? The solution MUST work for all of the URL types listed.

Thanks.

Justin
  • 5,668

10 Answers

26

I'd go with basename "$URL" .git, which strips the directory part and then the trailing .git suffix.
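
As a quick check, the sketch below runs that basename call over the three URL forms from the question. The function name repo_name is only an illustrative helper, not part of any tool.

```bash
#!/bin/bash
# repo_name is a hypothetical helper name used only for this sketch.
repo_name() {
    # basename drops everything up to the last "/", then the ".git" suffix if present.
    basename "$1" .git
}

for url in \
    "git://github.com/some-user/my-repo.git" \
    "git@github.com:some-user/my-repo.git" \
    "https://github.com/some-user/my-repo.git"
do
    repo_name "$url"   # prints "my-repo" for each of the three URLs
done
```

This works for the git@ form as well because the repository name still follows the last "/" in the string.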

womble
  • 98,245
26
$ url=git://github.com/some-user/my-repo.git
$ basename=$(basename $url)
$ echo $basename
my-repo.git
$ filename=${basename%.*}
$ echo $filename
my-repo
$ extension=${basename##*.}
$ echo $extension
git
quanta
  • 52,423
15

Old post, but I faced the same problem recently.

The regex ^(https|git)(:\/\/|@)([^\/:]+)[\/:]([^\/:]+)\/(.+)\.git$ works for the three types of URL.

#!/bin/bash

# Three sample URLs; the last assignment wins, so comment lines out to try the others.
url="git://github.com/some-user/my-repo.git"
url="https://github.com/some-user/my-repo.git"
url="git@github.com:some-user/my-repo.git"

re="^(https|git)(://|@)([^/:]+)[/:]([^/:]+)/(.+)(\.git)*$"

if [[ $url =~ $re ]]; then
    protocol=${BASH_REMATCH[1]}
    separator=${BASH_REMATCH[2]}
    hostname=${BASH_REMATCH[3]}
    user=${BASH_REMATCH[4]}
    # (.+) is greedy and swallows the suffix, so strip a trailing .git here.
    repo=${BASH_REMATCH[5]%.git}
fi


Explanation (see it in action on regex101):

  • ^ matches the start of a string
  • (https|git) matches and captures the characters https or git
  • (:\/\/|@) matches and captures the characters :// or @
  • ([^\/:]+) matches and captures one character or more that is not / nor :
  • [\/:] matches one character that is / or :
  • ([^\/:]+) matches and captures one character or more that is not / nor :, yet again
  • \/ matches the character /
  • (.+) matches and captures one character or more
  • (\.git)* matches an optional .git suffix at the end (the dot is escaped so that it only matches a literal .)
  • $ matches the end of a string

This is far from perfect, as something like https@github.com:some-user/my-repo.git would match, but I think it's good enough for extraction.
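
If the loose match mentioned above is a concern, one possible tightening is to tie each separator to its own prefix, so that :// is only accepted after https or git, and @ only after git. The following is a minimal sketch under that assumption; parse_git_url is a hypothetical helper name, and the pattern is a variation on the one above rather than the original.

```bash
#!/bin/bash
# parse_git_url is a hypothetical helper; prints "host user repo" or returns 1.
parse_git_url() {
    # Alternative 1: https://host/ or git://host/   (host lands in group 3)
    # Alternative 2: git@host:                      (host lands in group 4)
    local re='^((https|git)://([^/]+)/|git@([^:/]+):)([^/]+)/(.+)$'
    if [[ $1 =~ $re ]]; then
        # Only one of groups 3 and 4 is non-empty, so concatenating them yields the host.
        local host="${BASH_REMATCH[3]}${BASH_REMATCH[4]}"
        # Bash regexes have no lazy quantifier, so strip ".git" afterwards.
        printf '%s %s %s\n' "$host" "${BASH_REMATCH[5]}" "${BASH_REMATCH[6]%.git}"
    else
        return 1
    fi
}

parse_git_url "git://github.com/some-user/my-repo.git"
parse_git_url "git@github.com:some-user/my-repo.git"
parse_git_url "https://github.com/some-user/my-repo.git"
parse_git_url "https@github.com:some-user/my-repo.git" || echo "no match"
```

The first three calls each print "github.com some-user my-repo", while the malformed https@ URL is rejected.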

Hicham
  • 151
6

Summing up:

  • Get url without (optional) suffix:

    url_without_suffix="${url%.git}"
    
  • Get repository name:

    reponame="$(basename "${url_without_suffix}")"
    
  • Get the user (owner) name afterwards:

    username="$(basename "${url_without_suffix%/${reponame}}")"
    username="${username##*:}"    # drop the git@host: prefix, if any
    
hypnoglow
  • 161
1

Use a regular expression: /([^/]+)\.git$/
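
In bash, this pattern can be applied with the =~ operator, without the surrounding / delimiters. A minimal sketch, assuming the URL always ends in .git (the variable names are illustrative):

```bash
#!/bin/bash
url="git@github.com:some-user/my-repo.git"

# Capture whatever sits between the last "/" and the trailing ".git".
re='/([^/]+)\.git$'

if [[ $url =~ $re ]]; then
    reponame=${BASH_REMATCH[1]}
    echo "$reponame"   # prints "my-repo"
fi
```

Keeping the pattern in a variable avoids quoting problems on the right-hand side of =~.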

0

basename is my favorite, but you can also use sed:

url=git://github.com/some-user/my-repo.git
reponame="$(echo $url | sed -r 's/.+\/([^.]+)(\.git)?/\1/')"
# reponame = "my-repo"

sed deletes everything up to the last / as well as the .git extension (if present), and keeps only group \1, which matches a run of characters other than a dot ([^.]+). As a result, repository names that contain a dot are not handled correctly.

0

Hicham's awesome answer above allowed me to come up with this, which uses sed to output exactly what I needed: org/reponame.

output=$(echo "${git_url}" | sed -nr 's/^(https|git)(:\/\/|@)([^\/:]+)[\/:]([^\/:]+)\/(.+)\.git$/\4\/\5/p')

Works well on Ubuntu, but doesn't work with the sed available by default on Mac OS X.
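
A likely cause of the difference is the -r flag, which is a GNU spelling; -E selects extended regular expressions in both GNU sed and BSD sed. The sketch below is a variant under that assumption, using # as the delimiter so the slashes need no escaping. It has not been checked against every sed version.

```bash
#!/bin/bash
git_url="git@github.com:some-user/my-repo.git"

# -n with the p flag prints only lines where the substitution succeeded.
# Group 4 is the org/user and group 5 is the repository name.
output=$(printf '%s\n' "$git_url" |
    sed -nE 's#^(https|git)(://|@)([^/:]+)[/:]([^/:]+)/(.+)\.git$#\4/\5#p')

echo "$output"   # prints "some-user/my-repo"
```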

0

A slight modification to @Hicham's answer

^(https|git)(:\/\/|@)([^\/:]+)[\/:]([^\/:]+)\/(.+?)(\.git)?$

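This captures the .git suffix in its own optional group, so the repository name group no longer includes it. Note that the lazy quantifier in (.+?) needs a PCRE-style engine; it is not part of the POSIX extended syntax that bash's =~ uses.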

0

After fiddling for half a day on regex101 and using input from @womble and the others, I came up with this, which also uses named capture groups to show what is handled where. It may even help me in the near future :P

/^((?<protocol>https?|ssh|git|ftps?):\/\/)?((?<user>[^\/@]+)@)?(?<host>[^\/:]+)[\/:](?<port>[^\/:]+)\/(?<path>.+\/)?(?<repo>.+?)(?<suffix>\.git[\/]?)?$/

It basically lets you get the repo name (see part ?) from a repo URL in any of these forms, since it handles the optional .git suffix and the trailing slash:

.../reponame.git, .../reponame.git/, .../reponame and .../reponame/
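
Named groups and lazy quantifiers are not available in bash's own regex matching, so the same four URL endings can instead be handled with parameter expansion. This is a minimal sketch rather than a translation of the regex above, and repo_from_url is a hypothetical helper name.

```bash
#!/bin/bash
# repo_from_url is a hypothetical helper used only for this sketch.
repo_from_url() {
    local url=$1
    url=${url%/}         # drop one trailing "/", if any
    url=${url%.git}      # drop a trailing ".git", if any
    url=${url##*[/:]}    # keep only what follows the last "/" or ":"
    printf '%s\n' "$url"
}

repo_from_url "https://github.com/some-user/my-repo.git"
repo_from_url "https://github.com/some-user/my-repo.git/"
repo_from_url "https://github.com/some-user/my-repo"
repo_from_url "git@github.com:some-user/my-repo/"
```

Each of the four calls prints "my-repo".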

0
basename "$git_repo_url" | sed 's/\.git$//'
Michael Hampton
  • 252,907
jit
  • 1